Clustering Curves in the Presence of Heteroscedastic Errors

Author

  • Nicoleta Serban
Abstract

The clustering technique introduced in this paper is a means of discovering underlying patterns among a large number of curves. One novel characteristic compared with existing clustering methods is that we allow for heteroscedastic errors. Both the mean and the variance functions of each curve are assumed unknown and varying over time. The clustering method consists of a series of steps: transformation using an orthonormal basis of functions in L2, dimension reduction through coefficient estimation in the transform domain, and clustering in the transform space. We show that in the transform space, the coefficient estimation procedure introduced in this paper is asymptotically optimal in Pinsker's minimax sense over Sobolev ellipsoids. We illustrate our technique by clustering a large number of curves both within a synthetic example and within a specific application. In this application, we analyze the research and development expenditure over time of a subset of companies in the Compustat Global database. We show that our method compares favorably to two alternative approaches.
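The three steps named in the abstract (transform, dimension reduction, clustering in the transform space) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's procedure: the cosine basis, the plain truncation to the first five coefficients (in place of the paper's asymptotically optimal coefficient estimator), the k-means step, the simulated curves, and all function names are hypothetical choices made for the example.

```python
import numpy as np


def cosine_basis(n_points, n_basis):
    """Orthonormal cosine basis of L2[0, 1] on a midpoint grid (assumed basis)."""
    t = (np.arange(n_points) + 0.5) / n_points
    basis = np.ones((n_basis, n_points))
    for j in range(1, n_basis):
        basis[j] = np.sqrt(2.0) * np.cos(np.pi * j * t)
    return t, basis


def transform_coefficients(curves, n_basis):
    """Project each row of `curves` onto the first n_basis basis functions."""
    n_points = curves.shape[1]
    _, basis = cosine_basis(n_points, n_basis)
    # Riemann-sum approximation of the inner product <curve, basis_j>.
    return curves @ basis.T / n_points


def kmeans(x, k, n_iter=50):
    """Plain Lloyd iterations with deterministic farthest-first initialisation."""
    centers = [x[0]]
    for _ in range(1, k):
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = x[labels == c].mean(axis=0)
    return labels


def simulate_curves(n_per_group=20, n_points=100, seed=0):
    """Two groups of noisy curves whose noise level grows over time (toy data)."""
    rng = np.random.default_rng(seed)
    t = (np.arange(n_points) + 0.5) / n_points
    means = [np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)]
    sigma = 0.1 + 0.4 * t  # heteroscedastic: standard deviation varies with t
    curves = [m + sigma * rng.standard_normal(n_points)
              for m in means for _ in range(n_per_group)]
    return np.array(curves)


curves = simulate_curves()
coefs = transform_coefficients(curves, n_basis=5)  # dimension reduction: 100 -> 5
labels = kmeans(coefs, k=2)
```

Clustering the five retained coefficients rather than the 100 raw observations is what makes the transform step act as dimension reduction. How many coefficients to keep, and how to shrink them when the variance function is unknown, is the part the paper addresses and this sketch does not.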

Similar resources

Estimating and Clustering Curves in the Presence of Heteroscedastic Errors

The technique introduced in this paper is a means for estimating and discovering underlying patterns for a large number of curves observed with heteroscedastic errors. Therefore, both the mean and the variance functions of each curve are assumed unknown and varying over time. The method consists of a series of steps. We transform using an orthonormal basis of functions in L2. In the transform d...

Magnetic Calibration of Three-Axis Strapdown Magnetometers for Applications in MEMS Attitude-Heading Reference Systems

In a strapdown magnetic compass, the heading angle is estimated using the Earth's magnetic field measured by three-axis magnetometers (TAM). However, due to several inevitable errors in the magnetic system, such as sensitivity errors, non-orthogonality and misalignment errors, hard iron and soft iron errors, measurement noise and local magnetic fields, there are large errors between the magnetometers'...

Wavelet designs for estimating nonparametric curves with heteroscedastic error

In this paper, we discuss the problem of constructing designs in order to maximize the accuracy of nonparametric curve estimation in the possible presence of heteroscedastic errors. Our approach is to exploit the flexibility of wavelet approximations to approximate the unknown response curve by its wavelet expansion thereby eliminating the mathematical difficulty associated with the unknown s...

Determination of the Best Hierarchical Clustering Method for Regional Analysis of Base Flow Index in Kerman Province Catchments

The lack of complete coverage of hydrological data forces hydrologists to use homogenization methods in regional analysis. In this research, in order to choose the best hierarchical clustering method for regional analysis, base flow and the related index were extracted from daily stream flow data using two-parameter recursive digital filters at 43 hydrometric stations in Kerman province. Ph...

Estimating smooth distribution function in the presence of heteroscedastic measurement errors

Measurement error occurs in many biomedical fields. Challenges arise when the errors are heteroscedastic, since we have only one observation for each error distribution. This paper concerns the estimation of a smooth distribution function when data are contaminated with heteroscedastic errors. We study two types of methods to recover the unknown distribution function: a Fourier-type deco...

Publication date: 2006